Towards Q-learning the Whittle index for restless bandits

J Fu; Y Nazarathy; S Moka; PG Taylor

Conference Proceedings

2019 Australian and New Zealand Control Conference (ANZCC 2019) | IEEE | Published: 2019

DOI: 10.1109/ANZCC47194.2019.8945748

Abstract

We consider the restless multi-armed bandit problem (RMABP) with an infinite horizon average cost objective. Each arm of the RMABP is associated with a Markov process that operates in two modes: active and passive. At each time slot, a controller must designate a subset of the arms to be active; the processes associated with active arms then evolve differently from those of passive arms. Treated as an optimal control problem, the optimal solution of the RMABP is known to be computationally intractable. In many cases, the Whittle index policy achieves near-optimal performance and can be tractably found. Nevertheless, computation of the Whittle indices requires knowledge of the transition matrices of th..
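As a rough illustration of the setting described in the abstract, the sketch below advances a small restless bandit by one time slot under a generic index policy: the arms whose current states carry the largest index values are made active, and every arm then moves according to its active or passive transition matrix. It is a minimal sketch and not the paper's algorithm. The function name `step_restless_bandit`, the index table, and the transition matrices are all hypothetical placeholders; in particular, the index values are made up for the demo and are not Whittle indices computed or learned by any method, and the rule "activate the largest indices" is an assumed convention.

```python
import numpy as np


def step_restless_bandit(states, P_passive, P_active, indices, m, rng):
    """Advance every arm by one time slot under an index policy.

    states    : (n_arms,) int array, current state of each arm
    P_passive : (n_arms, n_states, n_states) passive transition matrices
    P_active  : (n_arms, n_states, n_states) active transition matrices
    indices   : (n_arms, n_states) index table (hypothetical values)
    m         : number of arms to activate in this slot
    rng       : numpy random Generator
    """
    n_arms = len(states)
    # Look up the index of each arm in its current state.
    current = indices[np.arange(n_arms), states]
    # Assumed convention: activate the m arms with the largest indices.
    active = np.zeros(n_arms, dtype=bool)
    active[np.argsort(-current, kind="stable")[:m]] = True
    # Every arm evolves, active or not; that is what makes the bandit restless.
    next_states = np.empty_like(states)
    for i in range(n_arms):
        P = P_active[i] if active[i] else P_passive[i]
        next_states[i] = rng.choice(P.shape[1], p=P[states[i]])
    return active, next_states


# Toy demo with three two-state arms. Transitions are deterministic here
# only to keep the example easy to follow: passive arms stay put and
# active arms flip state.
stay = np.array([[1.0, 0.0], [0.0, 1.0]])
flip = np.array([[0.0, 1.0], [1.0, 0.0]])
P_passive = np.stack([stay, stay, stay])
P_active = np.stack([flip, flip, flip])
indices = np.array([[0.1, 0.9], [0.5, 0.2], [0.3, 0.3]])  # made-up values
states = np.array([1, 0, 0])

active, next_states = step_restless_bandit(
    states, P_passive, P_active, indices, m=2, rng=np.random.default_rng(0)
)
print(active, next_states)
```

The policy itself needs only the index table. The transition matrices appear here solely to simulate the arms, which is separate from the question the abstract raises of how to obtain the indices when those matrices are unknown.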

University of Melbourne Researchers

Peter Taylor Author

Related Projects (1)

NEW STOCHASTIC MODELS FOR SCIENCE, ECONOMICS, SOCIAL SCIENCE AND ENGINEERING

Stochastic, or random, phenomena abound in society. This project will combine advancement of the theory of stochastic models at a deep leve..

Grants

Awarded by Australian Research Council

Funding Acknowledgements

J. Fu and P.G. Taylor's research is supported by the Australian Research Council (ARC) Laureate Fellowship FL130100039 and the ARC Centre of Excellence for the Mathematical and Statistical Frontiers (ACEMS). S. Moka's research is supported by ACEMS, under grant number CE140100049. Y. Nazarathy's research is supported by ARC grant DP180101602. The authors also thank Prof. Vivek Borkar for preliminary discussions.